Profile Clusters Derived from BLOCKS Suggest a Simple Model of Column Evolution in Multiple Alignments of Protein Families
Authors
Abstract
The BLOSUM and PAM series of substitution matrices are popular tools for scoring pairwise and multiple protein alignments. For multiple alignments, an evolving column can also be represented by a set of predefined frequency profile patterns. For conserved sites, these patterns are stationary points in the 20-dimensional profile space, and there are 20 of them. All were derived from the BLOCKS database by applying a special clustering procedure to the frequency profiles of BLOCKS alignment columns. To understand the nature of these clusters, random protein sequences were generated in which every column was derived from a single amino acid type by applying transition probabilities to it, and the same clustering procedure was applied to the resulting frequency profiles. Twenty similar clusters were obtained. This indicates that, in a conserved column, all amino acids are derived from a single ancestral amino acid by a random substitution process with standard transition probabilities. Non-conserved columns, in general, show no regularities in the amino acid types they contain. Using the COG database, a formula was obtained to distinguish functionally important columns from unimportant ones: if the ratio of the likelihood of the most probable ancestral amino acid to that of the third most probable ancestral amino acid exceeds a critical value, the column is predicted to be conserved, that is, the result of evolution from that ancestral amino acid. Otherwise, the column is considered a “garbage” column. When a consensus sequence is built from a multiple alignment, such a column can be represented by a “garbage” symbol that scores zero against every amino acid in the substitution matrix.
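The column model and decision rule described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: a hypothetical toy substitution model (`P_STAY`, the probability of keeping the ancestral residue, with the remainder spread evenly over the other 19 residues) stands in for the standard transition probabilities; every ancestor has the same prior, so the most probable ancestor is simply the one with the highest likelihood; `critical_log_odds` is a placeholder, not the value fitted on COG; and `#` is a hypothetical garbage symbol. The clustering of frequency profiles itself is not reproduced.

```python
import math
import random
from collections import Counter
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
GARBAGE = "#"  # hypothetical symbol for a non-conserved column

# Hypothetical toy substitution model (NOT the paper's): the ancestral residue
# is kept with probability P_STAY; otherwise it is replaced by one of the other
# 19 residues with equal probability.
P_STAY = 0.6


def transition_prob(ancestor: str, observed: str) -> float:
    """P(observed residue | ancestral residue) under the toy model."""
    if observed == ancestor:
        return P_STAY
    return (1.0 - P_STAY) / (len(AMINO_ACIDS) - 1)


def simulate_column(ancestor: str, n_seqs: int, rng: random.Random) -> str:
    """Generate a column whose residues all descend independently from one ancestor."""
    weights = [transition_prob(ancestor, b) for b in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=n_seqs))


def ranked_ancestors(column: str) -> list[tuple[float, str]]:
    """Log-likelihood of the column under each candidate ancestor, best first.

    Gaps and non-standard symbols are ignored. Working in log space avoids
    underflow when the column contains many sequences.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    scores = [
        (sum(n * math.log(transition_prob(a, b)) for b, n in counts.items()), a)
        for a in AMINO_ACIDS
    ]
    return sorted(scores, reverse=True)


def classify_column(column: str, critical_log_odds: float = 5.0) -> str:
    """Return the inferred ancestor if the column looks conserved, else GARBAGE.

    Compares the likelihood of the most probable ancestor with that of the
    third most probable one, as a log ratio. The default threshold is a
    hypothetical placeholder, not the paper's critical value.
    """
    ranked = ranked_ancestors(column)
    log_odds = ranked[0][0] - ranked[2][0]
    return ranked[0][1] if log_odds > critical_log_odds else GARBAGE


def garbage_aware_score(x: str, y: str, base_score: Callable[[str, str], int]) -> int:
    """Substitution score in which the garbage symbol scores zero against anything."""
    if GARBAGE in (x, y):
        return 0
    return base_score(x, y)


if __name__ == "__main__":
    # Hypothetical toy alignment of three sequences.
    alignment = ["MKLV", "MRLA", "MKLC"]
    columns = ["".join(col) for col in zip(*alignment)]
    print("".join(classify_column(col) for col in columns))  # prints MKL#
```

In the toy alignment, the last column (`V`, `A`, `C`) has three equally likely ancestors, so the top-versus-third ratio is 1 and the column becomes a garbage symbol. Because the toy model gives every non-ancestral residue the same probability, a candidate's log-likelihood here depends only on how often it occurs in the column; with real transition probabilities, frequently exchanged residues would also lend support to one another.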
Similar resources
The use of structure information to increase alignment accuracy does not aid homologue detection with profile HMMs
MOTIVATION The best quality multiple sequence alignments are generally considered to derive from structural superposition. However, no previous work has studied the relative performance of profile hidden Markov models (HMMs) derived from such alignments. Therefore several alignment methods have been used to generate multiple sequence alignments from 348 structurally aligned families in the HOMS...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
A Transition Probability Model for Amino Acid Substitutions from Blocks
Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for pr...
Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations
MOTIVATION As databanks grow, sequence classification and prediction of function by searching protein family databases becomes increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in Prosite, can be searched to classify new sequences. However, Prosite is incomplete, and families from other databases are now available to expand ...
webPRC: the Profile Comparer for alignment-based searching of public domain databases
Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used...